Cross Modal Evaluation of High Quality Emotional Speech Synthesis with the Virtual Human Toolkit
Abstract
Emotional expression is a key requirement for intelligent virtual agents. For an agent to produce dynamic spoken content, speech synthesis is required. However, despite substantial work with pre-recorded prompts, very little work has explored the combined effect of high quality emotional speech synthesis and facial expression. In this paper we offer a baseline evaluation of the naturalness and emotional range available by combining the freely available SmartBody component of the Virtual Human Toolkit (VHTK) with the CereVoice text-to-speech (TTS) system. Results echo previous work using pre-recorded prompts: the visual modality is dominant and the modalities do not interact. This allows the speech synthesis to add gradual changes to the perceived emotion in terms of both valence and activation. The naturalness reported is good, 3.54 on a 5-point MOS scale.
Similar resources
Synthesising and Evaluating Cross-Modal Emotional Ambiguity in Virtual Agents
Emotional ambiguity, when more than one emotion appears present at a given time, or several emotions are superimposed, is common in human interaction and effects such as irony can be intentionally created through a mismatch of such emotional signals. High quality emotional speech synthesis offers a means for testing the effect of combining differences in vocal emotion, facial expression and tex...
Emotional speech synthesis for emotionally-rich virtual worlds
This paper aims to give a brief overview of the current state of the art in emotional speech synthesis in a multi-modal context. After a brief introduction to the concept of text-to-speech synthesis, two approaches to the expression of emotions in speech synthesis are described. The categorical approach models emotions as discrete categories and is able to provide high-quality emotion...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis grants the computer the human ability of speech production. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Command Speech Interface to Virtual Reality Applications
During the last five years, several attempts to develop speech interfaces, especially for simulation applications, have emerged due to recent improvements in speech and language technology and the complexity of those applications' interfaces. We describe our approach to controlling Virtual Reality applications via voice and GUI, in the creation of a simple multimodal command speech interface based on dialog mo...
Anthropomorphic Agent as an Integrating Platform of Audio-Visual Information
One of the ultimate human-machine interfaces is an anthropomorphic spoken dialog agent that behaves like a human, with facial animation and gesture, and holds spoken conversations with humans. Among the numerous efforts devoted to this goal, the Galatea Project, conducted by 17 members from 12 universities, is developing an open-source, license-free software toolkit [1] for building an anthropomorphic spoken dia...